Scientific progress report

H. N. Mhaskar


1 Foreword

Many modern applications require modeling and analysis of functions on large, high-dimensional, unstructured data sets. One may assume that the data lie on a low-dimensional manifold, but this manifold is not known. We have extended the diffusion geometry paradigm to study function approximation on such data-defined manifolds. Our algorithms have been applied successfully to the recognition of handwritten digits, classification and missing data problems, automatic diagnosis of age-related macular disease based on multi-spectral images, and prediction of blood glucose levels. The ideas have also been applied to other problems, such as the analysis of terrain data and the solution of partial differential equations. The scientific barriers include the development of kernel-based methods that avoid computing the eigenvalues and eigenvectors of large matrices, and of quadrature formulas that are guaranteed to work better than straightforward Monte Carlo integration.


2 Statement of Problems Studied

The grant was a continuation of our research on function approximation on the Euclidean sphere, supported by the ARO. During this project, we studied a variety of extensions of this theory to the context of data-defined manifolds, bringing the theory on such manifolds to the same level of completion as that on the sphere.

The main underlying problem is the following. One starts with a data structure called a point cloud, which is a finite subset $V=\{x_i\}$ of some high-dimensional ambient Euclidean space, together with an affinity relation $W$, where one interprets $W_{i,j}$ as $W(x_i,x_j)$, indicating how "close" $x_i$ is to $x_j$. The matrix $W$ clearly defines an undirected graph, which can be embedded into a low-dimensional manifold using the diffusion geometry paradigm; i.e., one considers the limit of the graph Laplacian as the Laplace-Beltrami operator on a manifold, which has eigenvalues $-\lambda_k^2$ and corresponding eigenfunctions $\phi_k$. While most of the existing theory focuses on understanding the data geometry and on data visualization, applications to semi-supervised learning can be cast as problems of function approximation. Thus, in classification problems, one knows the class labels on a small training set $\mathcal{C}\subset V$, which may be viewed as a function $f:\mathcal{C}\to\mathbb{R}$. The problem of semi-supervised learning is then to extend this function to $V$; i.e., to learn the class label of every point in $V$. The main questions of interest to us were the following:

1. Study the connection between the smoothness properties of the target function $f$, as defined by the geometry of the unknown manifold, and the intrinsic approximation error that can be expected in approximating $f$ by a linear combination of finitely many eigenfunctions $\phi_k$.

2. Develop algorithms to compute efficiently a linear approximation process that yields a near-best approximation.
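The setup above can be made concrete with a small sketch. It assumes a Gaussian affinity $W(x_i,x_j)=\exp(-\|x_i-x_j\|^2/\varepsilon)$, uses the unnormalized graph Laplacian $D-W$ as a finite stand-in for the Laplace-Beltrami operator, and extends the training labels by a plain least-squares fit with the lowest eigenvectors. The names `graph_laplacian` and `extend_labels` and the parameters `eps` and `n_eig` are hypothetical; the plain least-squares fit is not the near-best approximation process asked for in item 2.

```python
import numpy as np


def graph_laplacian(points: np.ndarray, eps: float) -> np.ndarray:
    """Unnormalized graph Laplacian L = D - W with an assumed Gaussian affinity."""
    diff = points[:, None, :] - points[None, :, :]
    W = np.exp(-np.sum(diff ** 2, axis=-1) / eps)  # W[i, j] = W(x_i, x_j)
    np.fill_diagonal(W, 0.0)                        # no self-loops
    return np.diag(W.sum(axis=1)) - W


def extend_labels(points, train_idx, train_labels, n_eig, eps):
    """Extend labels known on a training subset C to all of V (least squares, assumed)."""
    L = graph_laplacian(points, eps)
    _, evecs = np.linalg.eigh(L)          # eigenvalues in ascending order
    Phi = evecs[:, :n_eig]                # discrete stand-ins for the eigenfunctions phi_k
    coef, *_ = np.linalg.lstsq(Phi[train_idx], train_labels, rcond=None)
    return Phi @ coef                     # extended values on every point of V


# Two well-separated circles with two labeled points on each.
t = 2 * np.pi * np.arange(30) / 30
circle = np.column_stack([np.cos(t), np.sin(t)])
points = np.vstack([circle, circle + [10.0, 0.0]])
labels = np.concatenate([np.ones(30), -np.ones(30)])
train_idx = np.array([0, 5, 30, 35])
pred = extend_labels(points, train_idx, labels[train_idx], n_eig=2, eps=0.5)
```

Because the two clusters are essentially disconnected, the two lowest eigenvectors span the cluster indicators, so four labels suffice to recover every label in this toy case.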

The ideas developed during this work found applications in some other areas as well, such as the solution of partial differential equations, image processing, and prediction of blood glucose levels in diabetes patients.

2.1 Scientific barriers

• The target function is defined on a manifold, or even a graph, with no known structure. The only information available is the unstructured data.

• The eigenfunctions of the heat kernel do not have special function properties, such as the Funk-Hecke formula or the addition formula, that are familiar in the classical theory.

• Since the heat kernel is the only object that can be constructed approximately, the conditions must be formulated in terms of this kernel.

• The existence of quadrature formulas is not clear, since the underlying manifold and its Riemannian measure are both unknown. Classical tools such as the Bernstein inequality are not available in this setting, and must be developed anew.

• The data may be nominally high-dimensional, posing formidable numerical problems.

2.2 Our approach

• Consider the data to be a sample from an unknown manifold.

• Approximate the heat kernel on this manifold using a graph Laplacian constructed from the data. The eigenfunctions of this kernel form the class of approximants.

• Develop a filter which yields a highly localized kernel and which is efficient to apply. This yields local approximation from Fourier information, i.e., from the coefficients of the target function with respect to the eigenfunctions.

• Develop quadrature formulas, based on scattered data, that are exact for approximants of high complexity, and use these to convert the filtered approximation into a kernel-based approximation using the available data.
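In the classical trigonometric setting of the survey [19], the filtering step can be sketched as follows: each Fourier coefficient $\hat f(k)$ is multiplied by $h(|k|/n)$, where $h$ is a smooth function equal to 1 on $[0,1/2]$ and to 0 on $[1,\infty)$. The particular $C^\infty$ construction of $h$, the FFT computation from equispaced samples, and all names below are assumptions for illustration; on a data-defined manifold, the exponentials would be replaced by the eigenfunctions $\phi_k$ and $|k|$ by $\lambda_k$.

```python
import numpy as np


def _g(u):
    """exp(-1/u) for u > 0 and 0 otherwise: the building block of a C-infinity step."""
    u = np.asarray(u, dtype=float)
    safe = np.where(u > 0, u, 1.0)
    return np.where(u > 0, np.exp(-1.0 / safe), 0.0)


def smooth_filter(t):
    """Assumed low-pass filter h: 1 on [0, 1/2], 0 on [1, inf), smooth in between."""
    s = np.clip(2.0 * (np.abs(t) - 0.5), 0.0, 1.0)
    return _g(1.0 - s) / (_g(1.0 - s) + _g(s))


def filtered_approximation(samples, n):
    """sigma_n(f) = sum_k h(|k|/n) fhat(k) e^{ikx}, from N equispaced samples on [0, 2*pi)."""
    N = len(samples)
    fhat = np.fft.fft(samples)                # DFT coefficients (scaling undone by ifft)
    k = np.fft.fftfreq(N, d=1.0 / N)          # integer frequencies in FFT order
    return np.real(np.fft.ifft(fhat * smooth_filter(k / n)))


x = 2 * np.pi * np.arange(64) / 64
f = np.cos(3 * x) + np.sin(5 * x)
approx = filtered_approximation(f + np.cos(20 * x), n=16)  # the degree-20 term is removed
```

Since $h=1$ on $[0,1/2]$, trigonometric polynomials of degree at most $n/2$ are reproduced exactly; the smoothness of $h$ is what makes the kernel $\sum_k h(|k|/n)e^{ik(x-y)}$ decay rapidly away from $x=y$, in contrast to a sharp cutoff.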


3 Summary of results

The research resulted in 17 publications and a number of colloquium and conference presentations.

The paper [19] is an invited survey paper, in which the basic ideas behind the research are illustrated in the context of multivariate trigonometric polynomials.

The papers [9, 10] deal with the question of developing quadrature formulas that enable us to discretize various integral operators in the theory of diffusion geometry while keeping track of the errors, so that the accuracy of function approximation is not affected significantly. An important special case of this theory, quadrature formulas for spherical triangles, was developed in [1].
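As a toy illustration of a quadrature formula that is exact for approximants of a given complexity, the sketch below computes, on the circle, real weights $w_j$ for scattered nodes $x_j$ that integrate every trigonometric polynomial of degree at most $n$ exactly, by taking the minimum-norm solution of the underdetermined moment equations. This least-squares construction, the jittered nodes, and the name `quadrature_weights` are assumptions; it is not the construction of [9, 10] or [1].

```python
import numpy as np


def quadrature_weights(nodes, n):
    """Assumed construction: minimum-norm real weights w with
    sum_j w_j p(x_j) = integral of p over [0, 2*pi) for all trig polynomials p of degree <= n."""
    k = np.arange(n + 1)
    rows_cos = np.cos(np.outer(k, nodes))      # moment of cos(kx): 2*pi if k == 0, else 0
    rows_sin = np.sin(np.outer(k[1:], nodes))  # moment of sin(kx): always 0
    A = np.vstack([rows_cos, rows_sin])
    b = np.zeros(A.shape[0])
    b[0] = 2 * np.pi
    w, *_ = np.linalg.lstsq(A, b, rcond=None)  # minimum-norm solution (more nodes than moments)
    return w


rng = np.random.default_rng(0)
M, n = 40, 8
nodes = 2 * np.pi * (np.arange(M) + rng.uniform(0, 1, M)) / M  # jittered, scattered nodes
w = quadrature_weights(nodes, n)
print(w @ (2 + np.cos(3 * nodes) - 0.5 * np.sin(8 * nodes)))  # integral of this degree-8 polynomial is 4*pi
print(w @ np.exp(np.cos(nodes)), 2 * np.pi * np.i0(1.0))      # smooth function vs. its exact integral
```

For smooth functions, the error of such a formula is controlled by how well the function is approximated by polynomials of degree $n$, which is the mechanism by which exact quadrature can beat Monte Carlo sampling.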

A further extension of this work, in which the approximation preserves the known values of the target function at certain landmarks on the manifold, is studied in [6].

The papers [15, 16] deal with the question of function approximation without computing the eigenvalues and eigenfunctions explicitly, even though the smoothness of the target function is intimately related to the spaces spanned by these eigenfunctions. In particular, the paper [15] generalizes the results in [17] for radial basis function approximation on the sphere.
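The general principle of applying a spectral operator without computing any eigenvalues or eigenvectors can be illustrated with a standard technique from graph signal processing, swapped in here and not taken from [15, 16]: the heat semigroup $e^{-tL}$ of a graph Laplacian is applied to a vector through a Chebyshev polynomial in $L$, using only matrix-vector products. The function name, the truncation order, and the use of the Gershgorin bound $\lambda_{\max}\le 2\max_i L_{ii}$ (valid for $L=D-W$ with nonnegative weights) are assumptions of this sketch.

```python
import numpy as np


def chebyshev_heat_apply(L, f, t, order=30):
    """Approximate exp(-t L) f for L = D - W using only products with L (no eigensolver)."""
    lam_max = 2.0 * np.max(np.diag(L))                      # Gershgorin bound on the spectrum
    m = order + 1
    theta = np.pi * (np.arange(m) + 0.5) / m                # Chebyshev-Gauss nodes on [-1, 1]
    g = np.exp(-t * lam_max * (np.cos(theta) + 1.0) / 2.0)  # exp(-t*lam), lam mapped to [-1, 1]
    c = np.array([2.0 / m * np.sum(g * np.cos(j * theta)) for j in range(m)])
    c[0] /= 2.0

    def A(v):                                               # spectrum of L mapped into [-1, 1]
        return (2.0 / lam_max) * (L @ v) - v

    T_prev, T_cur = f, A(f)                                 # T_0(A) f and T_1(A) f
    out = c[0] * T_prev + c[1] * T_cur
    for j in range(2, m):                                   # T_{j+1} = 2 A T_j - T_{j-1}
        T_prev, T_cur = T_cur, 2.0 * A(T_cur) - T_prev
        out = out + c[j] * T_cur
    return out


# Cycle graph on 20 vertices; the reference uses the eigendecomposition the method avoids.
N = 20
W = np.zeros((N, N))
for i in range(N):
    W[i, (i + 1) % N] = W[(i + 1) % N, i] = 1.0
L = np.diag(W.sum(axis=1)) - W
f = np.sin(2 * np.pi * np.arange(N) / N) + (np.arange(N) == 3)
evals, evecs = np.linalg.eigh(L)
reference = evecs @ (np.exp(-evals) * (evecs.T @ f))
approx = chebyshev_heat_apply(L, f, t=1.0)
```

On a large sparse graph, each step costs one sparse matrix-vector product, which is the practical point of avoiding the eigendecomposition.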

The paper [20] is an offshoot of the ideas in this research in the context of expansions of functions in terms of Jacobi polynomials.

In the remaining publications, we focused on applications to various areas.

The paper [5] deals with image completion problems. We study the question of contextual recovery of missing data while preserving a certain number of normal derivatives at the boundary. The classical approach to this problem is to solve a differential equation. We have demonstrated both the applicability and the limitations of this approach in its full generality.
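For comparison, the classical differential-equation approach can be sketched in its simplest form, harmonic inpainting: the missing pixels are filled by solving the discrete Laplace equation with the known pixels as boundary data, here by Jacobi iteration. This is an assumed baseline, not the method of [5]; it matches only boundary values, and preserving normal derivatives as well would require a higher-order equation such as the biharmonic one. The function name, the iteration count, and the requirement that the hole not touch the image border (so that `np.roll` never wraps pixels from the opposite edge) are assumptions.

```python
import numpy as np


def harmonic_inpaint(image, mask, iterations=2000):
    """Fill pixels where mask is True by Jacobi iteration for the discrete Laplace equation.
    Assumes the masked region does not touch the image border."""
    u = image.astype(float).copy()
    u[mask] = u[~mask].mean()                      # arbitrary initial guess inside the hole
    for _ in range(iterations):
        # Average of the four neighbors; only the missing pixels are updated.
        avg = 0.25 * (np.roll(u, 1, axis=0) + np.roll(u, -1, axis=0)
                      + np.roll(u, 1, axis=1) + np.roll(u, -1, axis=1))
        u[mask] = avg[mask]
    return u


rows, cols = np.mgrid[0:32, 0:32]
image = rows + 2.0 * cols                          # a linear, hence harmonic, test image
mask = np.zeros_like(image, dtype=bool)
mask[10:21, 10:21] = True                          # square hole away from the border
damaged = np.where(mask, 0.0, image)
restored = harmonic_inpaint(damaged, mask)
```

A linear image is recovered exactly because its discrete Laplacian vanishes; images with curved level lines near the hole are where matching only boundary values shows its limitations.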

The paper [3] deals with the question of extending a function from a set of points on the torus to the whole torus. Unlike the rest of our research, the points are not dense on the torus, and the extension is chosen to minimize a Sobolev norm. In particular, we are able to overcome Runge's example.
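A minimal sketch of this idea on the circle (the one-dimensional torus): among trigonometric polynomials of degree at most $n$ that interpolate the data, choose the one minimizing the weighted norm $\sum_k (1+k^2)^s |c_k|^2$, a discrete Sobolev norm. The closed form $c = D^{-1}A^{*}(AD^{-1}A^{*})^{-1}y$, with $D=\mathrm{diag}((1+k^2)^s)$ and $A_{jk}=e^{ikx_j}$, is standard weighted minimum-norm algebra. The values of $n$ and $s$, the Runge-type test function, and the function name are assumptions, and the formulation need not coincide with that of [3].

```python
import numpy as np


def min_sobolev_interpolant(nodes, values, n, s):
    """Trig polynomial of degree <= n interpolating (nodes, values) with minimal
    weighted norm sum_k (1 + k^2)^s |c_k|^2; returns a callable on arrays of x."""
    k = np.arange(-n, n + 1)
    d_inv = (1.0 + k ** 2) ** (-float(s))          # diagonal of D^{-1}
    A = np.exp(1j * np.outer(nodes, k))            # A[j, k] = exp(i k x_j)
    G = (A * d_inv) @ A.conj().T                   # A D^{-1} A^*
    alpha = np.linalg.solve(G, values.astype(complex))
    c = d_inv * (A.conj().T @ alpha)               # c = D^{-1} A^* alpha

    def p(x):
        # For real data c_{-k} = conj(c_k), so the imaginary part is only round-off.
        return np.real(np.exp(1j * np.outer(x, k)) @ c)

    return p


nodes = np.linspace(-np.pi, np.pi, 21, endpoint=False)    # equispaced points on the circle
runge = lambda x: 1.0 / (1.0 + 25.0 * (x / np.pi) ** 2)
p = min_sobolev_interpolant(nodes, runge(nodes), n=64, s=1)
```

The degree $n$ may exceed the number of data points; the weighted norm, not the degree, selects which interpolant is returned.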

The theory in [3] is applied to the solution of partial differential equations in [2]. Our method outperforms standard packages such as deal.II.

In [4], we use the ideas in [3] for image segmentation problems. Figure 1 illustrates some of the results.


Figure 1: From left to right: the original cameraman image, the segmentation of the cameraman image, the original mandrill image, and the segmentation of the mandrill image.

In [11], we consider the problem of signal separation in stationary signals; i.e., given the samples

$$x(k) = \sum_{j=1}^{K} a_j \exp(-i\omega_j k) + \text{noise}, \qquad k = -N, \dots, N,$$

for a sufficiently large value of $N$, we wish to find $\omega_j$, $j = 1, \dots, K$. Once the $\omega_j$ have been found accurately, finding the $a_j$ amounts to solving a linear system of equations. In [11], we used the theory of orthogonal polynomials on the unit circle, together with our earlier theory of trigonometric polynomial frames, to solve this problem very accurately. We gave theoretical bounds on the effect of noise, without assuming any special distribution of the noise.

For example, we consider

$$x(k) = 34 + 300\cos(k\pi/4) + \cos(k\pi/2) + \epsilon(k), \qquad k = -1024, \dots, 1024,$$

where $\epsilon(k)$ is a random variable uniformly distributed in the range $[-3, 3]$. Thus, in addition to the large differences in the magnitudes at different frequencies, the noise is three times the strength of the weakest signal, at frequency $\pi/2$. As an average over 500 trials, the frequencies were recovered as

$$(-3.992361996552509\times 10^{-16},\ \pm 0.785398165178676,\ \pm 1.570153903610014).$$

In particular, the weakest frequency $\pi/2$ was detected with an accuracy of $6.4242\times 10^{-4}$, in spite of the noise being 3 times the strength of the corresponding signal.
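To make the problem concrete, the following sketch recovers the frequencies and amplitudes of the noise-free version of this example. It uses ESPRIT, a standard subspace method swapped in purely for illustration; it is not the orthogonal-polynomial and frame-based method of [11], and it carries none of the noise guarantees described above. The window length `window`, the function names, and the assumption that the number of components $K$ is known are choices of this sketch.

```python
import numpy as np


def esprit_frequencies(x, K, window):
    """ESPRIT estimate of omega_1..omega_K from x(k) = sum_j a_j exp(-i omega_j k)."""
    x = np.asarray(x, dtype=complex)
    cols = len(x) - window + 1
    H = np.array([x[r:r + cols] for r in range(window)])  # Hankel matrix H[r, c] = x[r + c]
    U, _, _ = np.linalg.svd(H, full_matrices=False)
    Us = U[:, :K]                                          # signal subspace
    # Shift invariance: Us[1:] = Us[:-1] @ Phi; the eigenvalues of Phi are exp(-i omega_j).
    Phi, *_ = np.linalg.lstsq(Us[:-1], Us[1:], rcond=None)
    return np.sort(-np.angle(np.linalg.eigvals(Phi)))


def amplitudes(x, ks, omegas):
    """Least-squares solution of the linear system for a_j once the omega_j are known."""
    V = np.exp(-1j * np.outer(ks, omegas))
    a, *_ = np.linalg.lstsq(V, np.asarray(x, dtype=complex), rcond=None)
    return a


ks = np.arange(-1024, 1025)
x = 34 + 300 * np.cos(ks * np.pi / 4) + np.cos(ks * np.pi / 2)  # noise-free example
omegas = esprit_frequencies(x, K=5, window=64)
a = amplitudes(x, ks, omegas)
```

Here $K=5$ because the cosines contribute the pairs $\pm\pi/4$ and $\pm\pi/2$ and the constant contributes frequency 0. With noisy samples, the estimates depend on the window length and on the gap between signal and noise singular values; no accuracy is claimed for that case.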

In [7, 8], we apply our theory to biomedical problems. We analyzed the Cleveland heart disease data set to determine the stage of a patient's heart disease based on 13 attributes, and also a variant of the Wisconsin breast cancer data set in which, instead of solving the classification problem, we omitted one of the independent variables and treated the problem as one of missing data recovery. In each of these examples, we outperformed state-of-the-art Support Vector Machine algorithms. A novelty of these papers is the classification of drusen in the retina of patients with age-related macular disease. The classification was based on a first-of-its-kind data set obtained by the National Institutes of Health. Each image in this data set was a multi-spectral image with 24 different frequencies. Our methods gave an automatic prognosis, as illustrated in Figure 2.


Figure 2: The left two images show the retina of a patient with advanced AMD at two different frequencies; the right-most image shows our classification of drusen based on 24 such images at different frequencies.

In [18], we applied the ideas in our research to the prediction of the blood glucose (BG) level, and of its rate of change, over a 15-minute prediction horizon, based on continuous glucose monitoring (CGM) device readings from the past half hour. To quantify the clinical accuracy of the predictors considered, we use the Prediction Error-Grid Analysis (PRED-EGA) [21], which has been designed especially for blood glucose predictors. This assessment methodology records reference glucose estimates paired with the estimates predicted for the same moments. As a result, PRED-EGA reports the percentages of Accurate (Acc.), Benign (Benign), and Erroneous (Error) predictions in the hypoglycemic (0-70 mg/dL), euglycemic (70-180 mg/dL), and hyperglycemic (180-450 mg/dL) ranges. This stratification is of great importance because the consequences of a prediction error in the hypoglycemic range are very different from those in the euglycemic range. In [18], the assessment is done with respect to references given as simulated noise-free BG readings. On a data set of 10 virtual patients obtained from the Padova/University of Virginia simulator [13], we obtained the representative results shown in Table 1, where we compare our results with those obtained using the state-of-the-art Modified Savitzky-Golay Filtering (MSGF) [12, 14]. Remarkably, in the hyperglycemic region, this research achieved 100% accuracy, while the previously known academic record is 91.43%. Also, in the euglycemic region, the percentage of benign predictions is reduced from 0.93% to 0.55%, and that of erroneous predictions from 0.091% to 0.05%.

Method        Hypoglycemia              Euglycemia                Hyperglycemia
              Acc.    Benign   Error    Acc.    Benign   Error    Acc.    Benign
MSGF          99.78   0.22     -        98.98   0.93     0.091    91.43   8.57
Our method    99.74   0.26     -        99.40   0.55     0.05     100     -

Table 1: Percentage of accurate, benign, and erroneous predictions in the hypoglycemic (columns 2-4), euglycemic (columns 5-7), and hyperglycemic (columns 8-9) regions, respectively.
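The stratification behind Table 1 amounts to grouping predictions by the reference glucose value and reporting per-range percentages. The sketch below assumes each prediction has already been labeled "Acc", "Benign", or "Error"; the PRED-EGA zone rules that produce these labels [21] are not reproduced. The half-open range boundaries (a reference of 70 mg/dL counts as euglycemic, and references of 450 mg/dL or more are ignored) and the function name are assumptions.

```python
from collections import Counter

RANGES = {
    "hypoglycemic": (0.0, 70.0),      # mg/dL; half-open [low, high) is an assumption
    "euglycemic": (70.0, 180.0),
    "hyperglycemic": (180.0, 450.0),
}
CATEGORIES = ("Acc", "Benign", "Error")


def stratified_percentages(reference_bg, labels):
    """Percentage of each label among predictions whose reference BG lies in each range.
    The labels themselves are assumed to come from the PRED-EGA zone rules."""
    table = {}
    for name, (low, high) in RANGES.items():
        in_range = [lab for ref, lab in zip(reference_bg, labels) if low <= ref < high]
        counts = Counter(in_range)
        total = len(in_range)
        table[name] = {c: (100.0 * counts[c] / total if total else 0.0) for c in CATEGORIES}
    return table


result = stratified_percentages(
    [60, 65, 100, 150, 200],
    ["Acc", "Benign", "Acc", "Acc", "Error"],
)
```

Each range is normalized separately, which is why a handful of hypoglycemic errors can weigh as much in the table as many euglycemic ones.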